The Generalized Spike Process, Sparsity, and Statistical Independence
A basis under which a given set of realizations of a stochastic process can
be represented most sparsely (the so-called best sparsifying basis (BSB)) and
the one under which such a set becomes as statistically independent as
possible (the so-called least statistically-dependent basis (LSDB)) are
important for data compression and have generated interest among computational
neuroscientists as well as applied mathematicians. Here we consider these bases
for a particularly simple stochastic process called the ``generalized spike
process'', which puts a single spike--whose amplitude is sampled from the
standard normal distribution--at a random location in the zero vector of length
n for each realization.
Unlike the ``simple spike process'' which we dealt with in our previous paper
and whose amplitude is constant, we need to consider the kurtosis-maximizing
basis (KMB) instead of the LSDB due to the difficulty of evaluating
differential entropy and mutual information of the generalized spike process.
By computing the marginal densities and moments, we prove that: 1) the BSB and
the KMB select the standard basis if we restrict our basis search within all
possible orthonormal bases in R^n; 2) if we extend our basis search
to all possible volume-preserving invertible linear transformations, then the
BSB exists and is again the standard basis whereas the KMB does not exist.
Thus, the KMB is rather sensitive to the orthonormality of the transformations
under consideration, whereas the BSB is insensitive to it. As in our previous
work, our results once again support the preference of the BSB over the
LSDB/KMB for data compression applications.
Comment: 26 pages, 2 figures
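The process described above is easy to simulate; here is a minimal sketch (the function name and interface are ours, not the paper's):

```python
import numpy as np

def generalized_spike(n, size, rng=None):
    """Sample realizations of the generalized spike process: each
    realization is a zero vector of length n carrying a single spike
    whose amplitude is drawn from the standard normal distribution
    and whose location is uniform over the n coordinates."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.zeros((size, n))
    locations = rng.integers(0, n, size=size)
    X[np.arange(size), locations] = rng.standard_normal(size)
    return X

X = generalized_spike(n=8, size=5)
```

Each row of X is one realization, so the standard basis represents every sample with exactly one nonzero coefficient, which is the sparsity that the BSB result refers to.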
How can we naturally order and organize graph Laplacian eigenvectors?
When attempting to develop wavelet transforms for graphs and networks, some
researchers have used graph Laplacian eigenvalues and eigenvectors in place of
the frequencies and complex exponentials in the Fourier theory for regular
lattices in the Euclidean domains. This viewpoint, however, has a fundamental
flaw: on a general graph, the Laplacian eigenvalues cannot be interpreted as
the frequencies of the corresponding eigenvectors. In this paper, we discuss
this important problem further and propose a new method to organize those
eigenvectors by defining and measuring "natural" distances between them
using ramified optimal transport theory, followed by embedding them into a
low-dimensional Euclidean domain. We demonstrate its effectiveness using a
synthetic graph as well as a dendritic tree of a retinal ganglion cell of a
mouse.
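As a point of reference, on a path graph (the discretized interval) the Laplacian eigenpairs do behave like frequencies and cosines, which is exactly the regular-lattice setting where the Fourier analogy works; a small sketch (the helper name is ours):

```python
import numpy as np

def path_graph_laplacian(n):
    """Unnormalized graph Laplacian L = D - A of the path graph on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

L = path_graph_laplacian(8)
evals, evecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
# On the path graph the k-th eigenvector oscillates k times, so sorting
# by eigenvalue happens to order eigenvectors by frequency -- the very
# correspondence that fails on a general graph.
```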
Sparsity vs. Statistical Independence in Adaptive Signal Representations: A Case Study of the Spike Process
Finding a basis/coordinate system that can efficiently represent an input
data stream by viewing it as realizations of a stochastic process is of
tremendous importance in many fields, including data compression and
computational neuroscience. Two popular measures of such efficiency of a basis
are sparsity (measured by the expected ℓ^p norm) and statistical independence
(measured by the mutual information). A deeper understanding of their
intricate relationship, however, remains elusive.
Therefore, we chose to study a simple synthetic stochastic process called the
spike process, which puts a unit impulse at a random location in an
n-dimensional vector for each realization. For this process, we obtained the
following results: 1) The standard basis is the best both in terms of sparsity
and statistical independence if n >= 5 and the basis search is
restricted within all possible orthonormal bases in R^n; 2) If we extend our
basis search to all possible invertible linear transformations in GL(n, R), then
the best basis in statistical independence differs from the one in sparsity; 3)
In either of the above settings, the best basis in statistical independence is
not unique, and there even exist those which make the inputs completely dense;
4) There is no invertible linear transformation that achieves true statistical
independence for n > 2.
Comment: 39 pages, 7 figures; submitted to Annals of the Institute of
Statistical Mathematics
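The sparsity criterion above can be illustrated numerically: for the spike process the standard basis attains an expected ℓ^p quasi-norm of exactly 1, while a generic rotation densifies the coordinates and inflates it. A sketch under our own naming, not the paper's code:

```python
import numpy as np

def expected_lp(X, p=0.5):
    """Monte Carlo estimate of E[ sum_i |x_i|^p ]; for 0 < p < 1,
    smaller values indicate a sparser representation."""
    return float(np.mean(np.sum(np.abs(X) ** p, axis=1)))

rng = np.random.default_rng(0)
n, m = 8, 4000
X = np.zeros((m, n))
X[np.arange(m), rng.integers(0, n, size=m)] = 1.0   # spike process: unit impulses

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))     # a random orthonormal basis
print(expected_lp(X), expected_lp(X @ Q))            # 1.0 vs. something larger
```

In the standard basis every sample has a single unit entry, so the expected ℓ^p cost is 1; after rotation each sample spreads over all n coordinates and the cost strictly increases.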
Two types of well followed users in the followership networks of Twitter
In the Twitter blogosphere, the number of followers is probably the most
basic and succinct quantity for measuring the popularity of users. However, the
number of followers can be manipulated in various ways; one can even buy
followers. Therefore, alternative popularity measures for Twitter users, based
on, for example, users' tweets and retweets, have been developed. In the
present work, we take a purely network-based approach to this fundamental question.
First, we find that two relatively distinct types of users possessing a large
number of followers exist, in particular among the Japanese, Russian, and
Korean users of the seven language groups that we examined. Users of the first
type follow only a small number of other users. Users of the second type follow
approximately as many other users as the number of followers they have. Then,
we compare the local (i.e., egocentric) followership networks
around the two types of users with many followers. We show that the second
type, which presumably consists of uninfluential users despite their large
numbers of followers, is characterized by high link reciprocity, a large number
of friends (i.e., those whom a user follows) among the followers, the
followers' own high link reciprocity, a large clustering coefficient, a large
fraction of second-type users among the followers, and a small PageRank. Our
network-based results support the view that the number of followers alone is a
misleading measure of a user's popularity. We propose that the number of
friends, which is simple to measure, also helps us to assess the popularity of
Twitter users.
Comment: 4 figures and 8 tables
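Link reciprocity, one of the measures the comparison above hinges on, is straightforward to compute from a followership map; a toy sketch (the adjacency format and names are our own, not the paper's):

```python
def reciprocity(follows, user):
    """Fraction of the users that `user` follows who follow back.
    `follows` maps each user id to the set of user ids they follow."""
    out = follows.get(user, set())
    if not out:
        return 0.0
    return sum(1 for v in out if user in follows.get(v, set())) / len(out)

follows = {
    "alice": {"bob", "carol"},  # alice follows bob and carol
    "bob":   {"alice"},         # bob follows alice back (reciprocated link)
    "carol": set(),             # carol does not follow back
}
print(reciprocity(follows, "alice"))  # 0.5
```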
Harmonic Wavelet Transform and Image Approximation
In 2006, Saito and Remy proposed a new transform called the Laplace Local Sine Transform (LLST) for image processing, which works as follows. Let f be a twice continuously differentiable function on a domain Ω. First, we approximate f by a harmonic function u such that the residual component v = f − u vanishes on the boundary of Ω. Next, we take the odd extension of v followed by the periodic extension, i.e., we obtain a periodic odd function v*. Finally, we expand v* into a Fourier sine series. In this paper, we propose to expand v* instead into a periodic wavelet series with respect to biorthonormal periodic wavelet bases with symmetric filter banks. We call this the Harmonic Wavelet Transform (HWT). The HWT has an advantage over both the LLST and conventional wavelet transforms. On the one hand, it removes the boundary mismatches as the LLST does. On the other hand, the HWT coefficients reflect the local smoothness of f in the interior of Ω. Hence the HWT algorithm approximates data more efficiently than the LLST, the periodic wavelet transform, the folded wavelet transform, and wavelets on the interval. We demonstrate the superiority of the HWT over these transforms using several standard images.
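A one-dimensional analogue makes the LLST decomposition concrete: in 1-D the harmonic component is simply the linear interpolant of the boundary values, so the residual vanishes at the endpoints and its odd periodic extension has a pure sine expansion. A sketch with our own naming, not the authors' code:

```python
import numpy as np

def llst_1d(f, x, k_max):
    """Split samples f on a grid x over [0, 1] into a 'harmonic'
    (in 1-D: linear) part u matching the endpoint values and a residual
    v = f - u vanishing at both endpoints, then estimate the sine
    coefficients b_k ~ 2 * integral_0^1 v(x) sin(k*pi*x) dx."""
    u = f[0] + (f[-1] - f[0]) * (x - x[0]) / (x[-1] - x[0])
    v = f - u
    ks = np.arange(1, k_max + 1)
    dx = x[1] - x[0]
    # simple Riemann-sum quadrature; adequate since v vanishes at the ends
    B = 2.0 * (np.sin(np.pi * np.outer(ks, x)) @ v) * dx
    return u, v, B

x = np.linspace(0.0, 1.0, 401)
u, v, B = llst_1d(np.exp(x), x, k_max=10)
```

Because the residual is smooth and vanishes at the boundary, its sine coefficients decay rapidly, which is the boundary-mismatch removal that both the LLST and the HWT exploit.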